S2CID (identifier)

Semantic Scholar is a research tool for scientific literature powered by artificial intelligence, developed at the Allen Institute for AI and publicly released in November 2015. It uses advances in natural language processing to provide summaries for scholarly papers. The Semantic Scholar team is actively researching the use of artificial intelligence in natural language processing, machine learning, human-computer interaction, and information retrieval. Semantic Scholar began as a database covering computer science, geoscience, and neuroscience. In 2017, the system began including biomedical literature in its corpus. As of September 2022, the corpus includes over 200 million publications from all fields of science.


Technology

Semantic Scholar provides a one-sentence summary of scientific literature. One of its aims was to address the challenge of reading numerous titles and lengthy abstracts on mobile devices. It also seeks to ensure that the three million scientific papers published yearly reach readers, since it is estimated that only half of this literature is ever read.

Artificial intelligence is used to capture the essence of a paper, generating the summary through an "abstractive" technique. The project uses a combination of machine learning, natural language processing, and machine vision to add a layer of semantic analysis to the traditional methods of citation analysis, and to extract relevant figures, tables, entities, and venues from papers.

In contrast with Google Scholar and PubMed, Semantic Scholar is designed to highlight the most important and influential elements of a paper. The AI technology is designed to identify hidden connections and links between research topics. Like the previously cited search engines, Semantic Scholar also exploits graph structures, which include the Microsoft Academic Knowledge Graph, Springer Nature's SciGraph, and the Semantic Scholar Corpus.

Each paper hosted by Semantic Scholar is assigned a unique identifier called the Semantic Scholar Corpus ID (abbreviated S2CID). The following entry is an example:

Semantic Scholar is free to use and, unlike similar search engines (e.g. Google Scholar), does not search for material that is behind a paywall.

One study compared the search abilities of Semantic Scholar through a systematic approach, and found the search engine to be 98.88% accurate when attempting to uncover the data. The same study examined other Semantic Scholar functions, including tools to survey metadata as well as several citation tools.
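As an illustration of how an S2CID might be handled in software, the following is a minimal Python sketch that normalizes an identifier string and turns it into a lookup link. It rests on assumptions that do not come from the text above: that an S2CID is a plain decimal number, and that a paper can be reached under the URL prefix shown in the code. The function names and the sample ID 123456 are hypothetical.

```python
import re

# Assumed URL prefix for resolving a corpus ID; not stated in the article.
S2CID_URL_PREFIX = "https://api.semanticscholar.org/CorpusID:"


def normalize_s2cid(raw: str) -> str:
    """Return the bare numeric ID from input such as 'S2CID 123456' or '123456'.

    Assumes an S2CID consists only of decimal digits.
    """
    match = re.fullmatch(r"\s*(?:S2CID[:\s]*)?(\d+)\s*", raw, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a valid S2CID: {raw!r}")
    return match.group(1)


def s2cid_url(raw: str) -> str:
    """Build a lookup link for an S2CID under the assumed URL prefix."""
    return S2CID_URL_PREFIX + normalize_s2cid(raw)
```

Calling s2cid_url("S2CID 123456") joins the assumed prefix with the digits 123456, while input that contains no digits raises a ValueError instead of producing a malformed link.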


Number of users and publications

As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from computer science and biomedicine. In March 2018, Doug Raymond, who developed machine learning initiatives for the Amazon Alexa platform, was hired to lead the Semantic Scholar project. As of August 2019, the number of included paper metadata records (not the actual PDFs) had grown to more than 173 million after the addition of the Microsoft Academic Graph records. In 2020, a partnership between Semantic Scholar and the University of Chicago Press Journals made all articles published under the University of Chicago Press available in the Semantic Scholar corpus. At the end of 2020, Semantic Scholar had indexed 190 million papers. In 2020, users of Semantic Scholar reached seven million a month.


See also

* List of academic databases and search engines

